Retrieving Domain-Specific Collocations by Co-occurrences and Word Order Constraints

Authors

Sayori Shimohata

Toshiyuki Sugio

Junji Nagata

Abstract

In this paper, we describe a method for automatically retrieving collocations from large text corpora. The method comprises two stages: (1) extracting strings of characters as units of collocations, and (2) extracting recurrent combinations of strings as collocations. Through this method, various types of domain-specific collocations can be retrieved simultaneously. The method is practical because it uses plain text and requires no language-specific information, such as lexical knowledge or parts of speech. Experimental results using English and Japanese text corpora show that the method is equally applicable to both languages.
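The two stages named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract gives no details, so the function names (`recurrent_strings`, `ordered_combinations`), the length bounds, the frequency thresholds, and the "first occurrence, no overlap" ordering rule are all assumptions chosen for illustration. The sketch works on raw characters only, in keeping with the claim that no lexical knowledge or part-of-speech information is needed.

```python
from collections import Counter


def recurrent_strings(lines, min_len=3, max_len=8, min_freq=2):
    """Stage 1 (illustrative): keep character substrings that recur in the corpus.

    min_len, max_len and min_freq are hypothetical parameters, not values
    taken from the paper.
    """
    counts = Counter()
    for line in lines:
        for n in range(min_len, max_len + 1):
            for i in range(len(line) - n + 1):
                counts[line[i:i + n]] += 1
    return {s for s, c in counts.items() if c >= min_freq}


def ordered_combinations(lines, units, min_freq=2):
    """Stage 2 (illustrative): keep ordered pairs of units that recur together.

    A pair (a, b) is counted for a line when the first occurrence of a ends
    before the first occurrence of b starts, so word order is respected and
    overlapping strings are not paired. Each pair counts once per line.
    """
    counts = Counter()
    for line in lines:
        found = [(line.find(u), u) for u in units if u in line]
        seen = set()
        for pos_a, a in found:
            for pos_b, b in found:
                if pos_a + len(a) <= pos_b:
                    seen.add((a, b))
        counts.update(seen)
    return {pair: c for pair, c in counts.items() if c >= min_freq}


# Toy corpus (invented for this example).
corpus = [
    "open the file menu",
    "open the edit menu",
    "open the view menu",
]
units = recurrent_strings(corpus, min_freq=3)
pairs = ordered_combinations(corpus, units, min_freq=3)
```

On this toy corpus the pair `("open", "menu")` is retained while the reversed pair `("menu", "open")` is not, which is the effect of the word order constraint. A practical system would also need to prune the many overlapping substrings produced in the first stage; that step is omitted here.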


Similar resources

Retrieving Collocations by Co-occurrences and Word Order Constraints

In this paper, we describe a method for automatically retrieving collocations from large text corpora. This method retrieves collocations in the following stages: 1) extracting strings of characters as units of collocations; 2) extracting recurrent combinations of strings, in accordance with their word order in a corpus, as collocations. Through the method, a wide range of collocations, especially...


Retrieving Collocations from Text: Xtract

Natural languages are full of collocations, recurrent combinations of words that co-occur more often than expected by chance and that correspond to arbitrary word usages. Recent work in lexicography indicates that collocations are pervasive in English; apparently, they are common in all types of writing, including both technical and nontechnical genres. Several approaches have been proposed to ...


A Three-Layered Collocation Extraction Tool and Its Application in China English Studies

We design a three-layered collocation extraction tool by integrating syntactic and semantic knowledge and apply it in China English studies. The tool first extracts peripheral collocations in the frequency layer from dependency triples, then extracts semi-peripheral collocations in the syntactic layer by association measures, and last extracts core collocations in the semantic layer with a simi...


Retrieving Collocations From Korean Text

This paper describes a statistical methodology for automatically retrieving collocations from POS-tagged Korean text using interrupted bigrams. The free order of Korean makes it hard to identify collocations. We devised four statistics, 'frequency', 'randomness', 'condensation', and 'correlation', to account for the more flexible word order properties of Korean collocations. We extracted meanin...


A Corpus-Based Tool for Exploring Domain-Specific Collocations in English

Coxhead’s (2000) Academic Word List (AWL) has been frequently used in EAP classrooms and re-examined in light of various domain-specific corpora. Although well-received, the AWL has been criticized for ignoring the fact that words tend to show irregular distributions and be used in different ways across disciplines (Hyland and Tse, 2007). One such difference concerns collocations. Academic word...



Journal:

Computational Intelligence

Volume 15

Publication date: 1999
